Genome of the HIV-1 inter-subtype (C/B′) and use thereof

ABSTRACT

The present invention refers to a polynucleotide comprising the nucleic acid sequence as depicted in SEQ ID NO:1, 2 or 3 or a fragment or derivative thereof, or a polynucleotide hybridizing with the nucleic acid sequence as depicted in SEQ ID NO:1, 2 or 3. The present invention further refers to polypeptides encoded by the nucleic acid sequence as depicted in SEQ ID NO:1, 2 or 3 or by a fragment or derivative thereof. The polynucleotides and polypeptides may be used as medicaments, vaccines or diagnostic substances, preferably for the treatment, prevention or diagnosis of HIV infections.

This application claims priority to PCT/DE 00/04073, filed on Nov. 16, 2000, and German DE 199 55 089.1, filed Nov. 16, 1999, and is a divisional of U.S. Ser. No. 10/130,157 filed Aug. 13, 2002. The entire text of each of the above-referenced applications is specifically incorporated herein by reference without disclaimer.

The present invention refers to a polynucleotide comprising the nucleic acid sequence as depicted in SEQ ID NO:1, 2 or 3 or a fragment or derivative thereof, or a polynucleotide hybridizing with the nucleic acid sequence as depicted in SEQ ID NO:1, 2 or 3. The present invention further refers to polypeptides encoded by the nucleic acid sequence as depicted in SEQ ID NO:1, 2 or 3 or by a fragment or derivative thereof. The polynucleotides and polypeptides may be used as medicaments, vaccines or diagnostic substances, preferably for the treatment, prevention or diagnosis of HIV infections.

Given the extent of the global Human Immunodeficiency Virus (HIV) pandemic, with an estimated number of more than 40 million infected people worldwide by the end of this century, more than 90 percent of whom live in developing countries, the development of an HIV vaccine is considered to be one of the major challenges of modern industrialized societies. However, the development of a successful HIV vaccine has so far been limited by the complicated biology of the virus and its complex interaction with the host's immune system. The few candidate vaccines that have been tested to date in developing countries in clinical phase 3 trials were mainly based on the HIV type 1 external glycoprotein gp120 or gp160. However, the outcome of these studies was somewhat disappointing in that the vaccines not only failed to induce broadly cross-neutralizing antibody and T cell responses, but could not even prevent the breakthrough infections that have been reported for some of the vaccinated individuals. One of the reasons for this is certainly the extensive sequence variation between the antigens used, which were derived from lab-adapted virus strains, and the genetically divergent viruses circulating throughout the testing areas such as Thailand.

Phylogenetic analyses of globally circulating HIV strains have identified a major group (M) of 10 different sequence subtypes (A-J) (Kostrikis et al. 1995; Leitner and Albert, 1995; Gaywee et al. 1996; World Health Organisation Network for HIV Isolation and Characterization, 1994) exhibiting sequence variations in the envelope protein of up to 24%, in addition to group O viruses, which differ from group M viruses by more than 40% in some reading frames (Loussert Ajaka et al. 1995; Myers et al. 1996; Sharp et al. 1995; Sharp et al. 1999). HIV evolves by the rapid accumulation of mutations and by intersubtype recombination. Different subtypes cocirculating in the population of a geographic region represent the molecular basis for the generation and distribution of interclade mosaic viruses. Although the global HIV-1 variants have been studied intensively by means of serology and heteroduplex DNA analysis, most phylogenetic studies are based on envelope sequences, because many of the prevalent subtypes and a variety of recombinant forms lack fully sequenced genomes.

Non-subtype B viruses cause the vast majority of new HIV-1 infections worldwide. Among those, clade C HIV-1 strains play a leading role regarding both the total number of infected people and the high incidence of new infections, especially in South America and Asia. For that reason, characterization of clade C viruses is one of the top priorities for diagnostic, preventive and therapeutic purposes.

With the exception of Thailand, limited information has been available until recently regarding the distribution and molecular characteristics of HIV-1 strains circulating throughout Asia. WHO estimates that South and Southeast Asia have the most rapid rate of HIV spread and will soon become the world's largest HIV epidemic region. China has very similar social and economic conditions and direct ethnic and economic connections to these regions. Since early 1995, a rapid increase of HIV infection has been clearly seen in many provinces of China. Compared with a cumulative 1774 cases of HIV and AIDS detected from 1985 to 1994, 1421 cases were detected in 1995 and more than 4000 cases in 1997 alone. The WHO estimated more than 400,000 HIV infections in China by the end of 1997, with an estimated 6400 cumulative deaths and 4000 people dying of AIDS in 1997 alone. In the recent national HIV molecular epidemiology survey, it was found that the subtype prototype B and B′-subtype Thai strains in Yunnan, a southwestern province of China bordering the drug triangle of Myanmar, Laos and Thailand (Graf et al. 1998), were spread to central and eastern China by drug users and by contaminated blood and plasma collection services. The second epidemic was imported to the same area, most probably by Indian IDUs carrying subtype C strains, in the early 1990s (Luo et al. 1995; Shao et al. 1999). Within a few years, subtype C viruses spread rapidly in South, Central and even Northwest China by drug trafficking and caused a widespread epidemic in China. According to a recent Chinese nationwide HIV molecular epidemiology survey, almost all the individuals infected with subtype C are IDUs, and they account for about 40% of HIV infected IDUs in China, suggesting subtype C virus to be one of the major HIV-1 subtypes prevalent among IDUs in China (Shao et al. 1998; Shao et al. 1994).

This suggests that the HIV epidemic among IDUs in China extended from a single predominant subtype (B) within a few years to at least two predominant subtypes, B-Thai and C, increasing the possibility of intersubtype recombination. According to our present knowledge on the variability and the antigenicity of different virus strains, diagnostic tools, therapeutic agents and vaccines should be adapted to local virus strains. However, the number of molecular reagents for non-subtype B viruses is still extremely limited. Currently, there are only a few non-recombinant molecular clones and a few mosaic genomes available for viruses other than B or C. Regarding clade C HIV-1 viruses, only non-recombinant representatives and 4 A/C recombinants have been published so far, all of them originating from Africa, South America or India (Luo et al. 1995; Gao et al. 1998; Lole et al. 1999). Furthermore, all of the previous data on subtype C viruses in China was limited to the genetic subtyping of the env gene (Luo et al. 1995; Yu et al. 1997; Salminen et al. 1995).

Several clinical trials of vaccines against HIV infection have been performed so far. The disappointing results that were observed in these trials include repeatedly reported breakthrough infections in the vaccinated people. This outcome has been attributed to major sequence variations between the administered envelope proteins and the infectious input virus, which in fact was primarily due to an insufficient characterization of the virus population circulating in a distinct geographic region. This resulted in the generation of humoral and, to a lesser extent, cell mediated immune responses towards viral antigens that were not relevant for the viruses circulating throughout the population in the test field. Moreover, low affinity binding, envelope specific antibodies have been reported not only to lack neutralizing capacity but even to contribute to an enhancement of infection via complement or Fc receptors. Furthermore, the selected antigens and delivery systems turned out to be extremely weak inducers of the cell mediated immune response.

In view of a lack of precise knowledge on cross-clade protective immune responses and regarding the complex situation in developing countries, where multiple subtypes of HIV-1 are known to cocirculate, vaccine preparations should include mixtures of representative antigens. Thus, there is a need for isolation and characterization of clade C viruses, especially for cloning the coding region.

The problem of the invention is solved by the subject matter of the claims.

The present invention is further illustrated by the figures.

FIG. 1 shows an illustration of the phylogenetic relationship of the env gene C2V3 coding region from clone 97cn54 with the representatives of the major HIV-1 (group M) subtypes. cn-con-c represents the env consensus sequence of HIV-1 subtype C strains prevalent in China. Phylogenetic tree was constructed using the neighbour joining method. Values at the nodes indicate the percent bootstraps in which the cluster to the right was supported. Bootstraps of 70% and higher only are shown. Brackets on the right represent the major subtype sequences of HIV-1 group M.

FIG. 2 shows an illustration of the Recombinant Identification Program analysis (RIP, version 1.3) of the complete gagpol coding region of 97cn54 (window size: 200, threshold for statistical significance: 90%, Gap handling: STRIP). Positions of the gag and pol open reading frames are indicated by arrows on top of the diagram. RIP analysis was based on background alignments using reference sequences derived from selected virus strains representing the most relevant HIV-1 subtypes. The standard representatives are marked by different colors as indicated. The x axis indicates the nucleotide positions along the alignment. The y axis indicates the similarity of 97cn54 with the listed reference subtypes.

FIG. 3 shows an illustration of the phylogenetic relationship of different regions within the 97cn54 derived gagpol reading frames with standard representatives of the major HIV-1 (group M) subtypes. Phylogenetic trees were constructed using the neighbour joining method based on the following sequence stretches: (A) nucleotides 1-478, (B) 479-620, (C) 621-1290, (D) 1291-1830, (E) 1831-2220, (F) 2221-2520 and (G) 2521-2971. Given positions refer to the first nucleotide of the gag open reading frame. Grey areas highlight clustering of the analyzed sequences either with clade C- (A, C, E, G) or B- (B, D, F) derived reference strains. Values at the nodes indicate the percent bootstraps in which the cluster to the right was supported. Bootstraps of 70% and higher only are shown.

FIG. 4 shows an illustration of the Recombinant Identification Program analysis (RIP, version 1.3) of different regions of 97cn54 (window size: 200, threshold for statistical significance: 90%, Gap handling: STRIP). Analysis included (A) a sequence stretch of 1500 bp from the start codon of the vif gene to the 5′ end of env, including vif, vpr, the first exons of tat and rev, vpu and the first 200 bp of the env gene and (B) a fragment of about 700 bp overlapping 300 bp from the 3′ end of env and encompassing the complete nef gene and parts of the 3′ LTR. Positions of the start codons of vpr, tat, vpu, env and nef as well as the 5′ end of the 3′-LTR are indicated by arrows on top of the diagrams, respectively. RIP analysis was based on background alignments using sequences derived from selected virus strains representing the most relevant HIV-1 subtypes. The indicated standard representatives are marked by different colors. The x axis indicates the nucleotide positions along the alignment. The y axis indicates the similarity of 97cn54 with the listed reference subtypes. (C) and (D) RIP analysis of sequences from two independent clade C isolates (xj24 and xj158) from China overlapping the vpr and vpu genes including the first exon of tat.

FIG. 5 shows a phylogenetic tree analysis. Phylogenetic trees were constructed using the neighbour joining method based (A) on a 380 bp fragment overlapping the 3′ 150 bp of the vpr gene to the end of the vpu reading frame, (B) on the first 290 bp of the nef coding region and (C) on the 3′ 320 bp of the nef gene. Values at the nodes indicate the percent bootstraps in which the cluster to the right was supported. Bootstraps of 70% and higher only are shown. Brackets on the right represent the major subtype sequences of HIV-1 group M.

FIG. 6 is an illustration of the schematic representation of the mosaic genome organization of 97cn54.

FIG. 7 is an illustration of the comparison between known and experimentally proven prototype B (HIV-1_(LAI)) derived CTL epitopes and the corresponding amino acid sequences in the gag, pol and env polypeptides of the clade C strain 97cn54. Functional domains in GAG (p17 matrix, p24 capsid, p15 nucleocapsid and linker protein), POL (PR protease, RT reverse transcriptase, IN integrase) and ENV (gp120 external glycoprotein, gp41 transmembrane protein) are indicated. Numbers underneath the open reading frames indicate amino acid positions relative to the amino termini of the respective polypeptides. Haplotype restrictions of the known HIV-1_(LAI) derived CTL epitopes are indicated at the left and right margins, respectively. Green bars represent sequence identity between the known epitope and the corresponding clade C sequence; blue bars indicate 2 or fewer conservative mismatches. Red bars represent clade C derived sequence stretches with more than 2 conservative mismatches or any nonconservative substitution as compared to the corresponding LAI derived epitope.

FIG. 8 shows the full length coding nucleotide sequence of clade C HIV-1 97cn54 (SEQ ID NO:1) with the corresponding amino acids in one letter code. All three reading frames are given (reading frame “a” (SEQ ID NOS:27-124); reading frame “b” (SEQ ID NOS:126-251); reading frame “c” (SEQ ID NOS:253-379)). The asterisks represent stop codons.

FIG. 9 shows the cytotoxic T cell activity in BALB/c mouse spleen cells after intramuscular immunization with the respective DNA plasmids. Lymphoid cells obtained 3 weeks after a primary immunization from 5 mice per group were co-cultured with syngeneic P815 mastocytoma cells (irradiated with 20,000 rad) loaded with a gag polypeptide having the amino acid sequence AMQMLKETI (SEQ ID NO:380). Controls included spleen cells from non-immunized mice which were stimulated with peptide-loaded P815 cells. Cytotoxic effector cell populations were harvested after 5 days of culture in vitro. The cytotoxic responses were read against A20 cells loaded with the above-mentioned nonameric peptide or against unloaded A20 cells in a standard ⁵¹Cr release assay. The data shown represent the mean values of experiments each performed three times. The standard deviations determined were each lower than 15% of the mean value.

The term “epitope” or “antigenic determinant” as used herein refers to an immunological determinant group of an antigen which is specifically recognized by an antibody. An epitope may comprise amino acids in a spatial or discontinuous conformation comprising at least 3, preferably at least 5 amino acids. An epitope may also comprise a single segment of a polypeptide chain comprising a continuous amino acid sequence.

The term “polynucleotide” as used herein refers to a single-stranded or double-stranded heteropolymer of nucleotide units of any length, either of ribonucleotides or deoxyribonucleotides. The term also includes modified nucleotides.

The term “derivative” as used herein refers to a nucleic acid which codes for the same polypeptide or polypeptides as another nucleic acid sequence although its own nucleic acid sequence differs from that other nucleic acid sequence. In this sense the term “derivative” also refers to equivalents of the other nucleic acid sequence which exist because of the degeneracy of the genetic code. Thus, the term derivative includes e.g. nucleic acids coding for the same polypeptides as the nucleic acids according to SEQ ID NO: 1, 2 or 3 but having another nucleic acid sequence. Furthermore, the term includes nucleic acid fragments coding for the same polypeptide as the nucleic acid fragments of the nucleic acid sequence according to SEQ ID NO: 1, 2 or 3.
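The notion of a derivative arising from the degeneracy of the genetic code can be illustrated with a minimal Python sketch. It assumes the standard genetic code; the function names `translate` and `is_derivative` are hypothetical, and the two nucleotide sequences are illustrative encodings of the nonameric peptide AMQMLKETI (SEQ ID NO:380) chosen for this example, not sequences taken from the sequence listing.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA coding sequence in frame 1; '*' marks a stop codon."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3  # ignore an incomplete trailing codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


def is_derivative(candidate: str, reference: str) -> bool:
    """True if the sequences differ but encode the same polypeptide."""
    return (candidate.upper() != reference.upper()
            and translate(candidate) == translate(reference))


# Two illustrative (hypothetical) encodings of the peptide AMQMLKETI.
seq_a = "GCCATGCAGATGCTGAAGGAGACCATC"
seq_b = "GCTATGCAAATGTTAAAAGAAACTATT"
print(translate(seq_a), translate(seq_b), is_derivative(seq_b, seq_a))
```

In this sketch, `seq_b` would count as a derivative of `seq_a` in the sense defined above, since both encode the same amino acid sequence although their nucleotide sequences differ.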

The term “polypeptide” as used herein refers to a chain of at least two amino acid residues connected by peptide linkages. The term comprises, therefore, any amino acid chains, e.g. oligopeptides and proteins. The term also refers to such amino acid chains wherein one or more amino acid(s) is(are) modified, e.g. by acetylation, glycosylation or phosphorylation.

The term “continuous sequence” or “fragment” as used herein refers to a linear nucleotide or amino acid stretch derived from a reference sequence, e.g. the sequences of the present invention set forth in the sequence listing.

The term “selective hybridization” or “selectively hybridizable” as used herein refers to hybridization conditions wherein two polynucleotides form duplex nucleotide molecules under stringent hybridization conditions. Those conditions are known in the state of the art and are set forth e.g. in Sambrook et al., Molecular Cloning, Cold Spring Harbor Laboratory (1989), ISBN 0-87969-309-6. Examples of stringent hybridization conditions are: (1) hybridization in 4×SSC at 65° C. or (2) hybridization in 50% formamide in 4×SSC at 42° C., both followed by several washing steps in 0.1×SSC at 65° C. for 1 hour.

The term “viral vector” or “bacterial vector” as used herein refers to genetically modified viruses or bacteria useful for the introduction of the DNA sequences according to SEQ ID NO: 1, 2 or 3, or of derivatives, fragments or sequences thereof coding for epitopes or epitope strings, into different cells, preferably into antigen presenting cells, e.g. dendritic cells. In addition, a bacterial vector may be suitable to directly express a polypeptide encoded by SEQ ID NO:1, 2 or 3 or epitopes or epitope strings derived therefrom.

One aspect of the present invention refers to a nucleotide sequence as depicted in SEQ ID NO:1, SEQ ID NO:2 or SEQ ID NO:3. In order to gather the necessary information on representative and virtually full-length viral genomes, a molecular epidemiology study was first conducted among more than one hundred HIV-1 subtype C seropositive intravenous drug users (IDUs) from China. Genotyping based on the constant region 2 and variable region 3 (C2V3) within the viral envelope glycoprotein gene revealed the highest homology of the most prevalent virus strains circulating throughout China to subtype C sequences of Indian origin. Based on these results, a virtually full length genome representing the most prevalent class of clade C strains circulating throughout China was amplified and subcloned from peripheral blood mononuclear cells (PBMCs) of a selected, HIV infected IDU. Sequence analysis identified a mosaic structure suggesting extensive intersubtype recombination events between genomes of the prevalent clade C and (B′)-subtype Thai virus strains of that geographic region. RIP (Recombinant Identification Program) analysis and phylogenetic bootstrapping suggested altogether ten break points (i) in the gagpol coding region, (ii) in vpr and at the 3′ end of the vpu gene as well as (iii) in the nef open reading frame. Thai (B′)-sequences therefore include (i) several insertions in the gagpol coding region (nucleotides 478-620, 1290-1830, 2221-2520, each referring to the first nucleotide within the start codon of the gag and gagpol reading frame, respectively), (ii) 3′-vpr, complete vpu, the first exons of tat and rev (approx. 1000 nucleotides starting from nucleotide 138 referring to the start codon of the vpr reading frame) as well as (iii) the 5′ half of the nef gene (nucleotides 1-300). The remaining parts of the sequence comprising 9078 nucleotides (SEQ ID NO:1, table 3) show the highest homologies to the known subtype C isolates.
Breakpoints located in the vpr/vpu coding region as well as in the nef gene of 97cn54 were found at similar positions in many subtype C strains isolated from IDUs living in different areas of China, suggesting a common ancestor for the C/B′ recombinant strains. More than 50% of well-defined subtype B-derived CTL epitopes within Gag and Pol and 10% of the known epitopes in Env were found to exactly match sequences within this clade C/B′ chimeric reference strain. These results may substantially facilitate vaccine-related efforts in China by providing highly relevant templates for vaccine design and for developing reagents for the most appropriate immunological/virological readouts.
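The principle behind a sliding-window recombination scan of the kind performed with RIP can be sketched as follows. This is a simplified Python illustration only, not the RIP program itself: it assumes pre-aligned sequences of equal length, removes every alignment column containing a gap (in the spirit of the STRIP gap handling), and reports for each window the reference with the highest fraction of identical positions. It does not reproduce RIP's statistical significance threshold. The function names and the toy sequences are hypothetical.

```python
def strip_gap_columns(seqs):
    """Remove every alignment column in which any sequence has a gap ('-')."""
    keep = [i for i in range(len(seqs[0])) if all(s[i] != "-" for s in seqs)]
    return ["".join(s[i] for i in keep) for s in seqs]


def best_match_per_window(query, references, window=200, step=1):
    """Return (window start, best reference name, similarity) per window.

    `references` maps a subtype label to an aligned reference sequence.
    Similarity is the fraction of identical positions within the window.
    """
    names = list(references)
    seqs = [query.upper()] + [references[n].upper() for n in names]
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("all sequences must be aligned to the same length")
    stripped = strip_gap_columns(seqs)
    q, refs = stripped[0], stripped[1:]
    results = []
    for start in range(0, len(q) - window + 1, step):
        q_win = q[start:start + window]
        sims = {
            name: sum(a == b for a, b in zip(q_win, ref[start:start + window])) / window
            for name, ref in zip(names, refs)
        }
        best = max(sims, key=sims.get)
        results.append((start, best, sims[best]))
    return results


# Toy alignment (hypothetical): the first half of the query resembles one
# reference, the second half the other, mimicking a single breakpoint.
toy_query = "AAAA-CCCC"
toy_refs = {"C_ref": "AAAATGGGG", "B_ref": "TTTTTCCCC"}
for row in best_match_per_window(toy_query, toy_refs, window=4, step=4):
    print(row)
```

A change of the best-matching reference between neighbouring windows hints at a candidate breakpoint, which in the analysis described above was then examined by phylogenetic bootstrapping of the individual segments.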

The use of the described HIV-1 sequence of the present invention, representing the most prevalent C type virus strain of China, as a basis and source is of advantage for the development of preventive or therapeutic vaccines. Necessary prerequisites for the development of a successful HIV candidate vaccine are (i) a detailed knowledge of the respective epidemiological situation and (ii) the availability of a cloned coding sequence representing the most prevalent virus strain within a geographic region or distinct population. Such sequences represent the basis (i) for the rational design of preventively and therapeutically applicable HIV candidate vaccines, (ii) for the development of specific therapeutic medicaments, e.g. therapeutically effective decoy oligonucleotides and proteins, antisense constructs, ribozymes and transdominant negative mutants, (iii) for the development of lentiviral vectors for gene therapy and (iv) for the production of reagents which may be utilized for diagnostics and monitoring of HIV infections and for immunological/viral monitoring of the vaccination process.

This is especially true for candidate vaccines that are based on the HIV envelope proteins, which were shown to be the most variable among all HIV proteins. Besides that, a successful vaccine will most probably have to induce both arms of the immune system: neutralizing antibodies directed ideally to conformational epitopes in the envelope protein as well as cell mediated immune responses (CD4 positive T-helper cells, CD8 positive cytolytic T-cells, Th1 type cytokines, β-chemokines) generated against epitopes of different viral proteins. The conformational epitope according to the present invention consists of at least 3 amino acids involved in the antibody binding and preferably of 5 or more amino acids. Conformational epitopes may also consist of several segments either of a single protein or, in the case of oligomer complexes such as the trimeric glycoprotein envelope complex, of several segments of different subunits. A linear epitope according to the present invention normally varies in length, comprising from at least 8 amino acids to about 15 amino acids or longer, preferably comprising 9 to 11 amino acids, in particular in the case of MHC class I restricted CTL epitopes.

Thus, the present invention further relates to polypeptides encoded by the nucleic acid sequence or fragment or derivative of the nucleic acid sequence according to SEQ ID NO:1, 2 or 3. The present invention further relates to polypeptides comprising a continuous sequence of at least 8 amino acids encoded by the nucleic acid sequence or fragments or derivatives of the nucleic acid sequence according to SEQ ID NO:1, 2 or 3. Preferably the polypeptide of the present invention comprises an antigenic determinant naturally causing an immune reaction in infected subjects. More preferred are polypeptides comprising an amino acid sequence encoded by the nucleic acid sequence according to SEQ ID NO:2 or 3 or a fragment or derivative thereof. Most preferred are epitopes comprising a continuous region of 9 to 11 amino acids which is identical between the polypeptides encoded by SEQ ID NO:1 and an HIV-1_(LAI) reference isolate, or which contains 2 or fewer conservative amino acid substitutions within the sequence comprising 9 to 11 amino acids. Examples of such epitopes are given in example 11. The polypeptides of the present invention may be used e.g. as vaccines and therapeutic substances or for diagnostics.
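The comparison of a reference epitope with the corresponding 97cn54 sequence (identical, 2 or fewer conservative substitutions, or more divergent, as in the green, blue and red categories of FIG. 7) can be sketched in Python as follows. The specification does not define which substitutions count as conservative; the physicochemical grouping used here is an assumption made for illustration, as are the function names and the variant peptides, which are invented from the nonamer AMQMLKETI (SEQ ID NO:380) and are not taken from the sequence listing.

```python
# ASSUMPTION: a simple physicochemical grouping; substitutions within one
# group are treated as conservative. Residues outside these groups
# (C, G, P) are never treated as conservative exchanges here.
CONSERVATIVE_GROUPS = ("AVLIM", "FWY", "STNQ", "KRH", "DE")


def is_conservative(a: str, b: str) -> bool:
    """True if both residues belong to the same assumed group."""
    return any(a in group and b in group for group in CONSERVATIVE_GROUPS)


def classify_epitope(reference: str, candidate: str) -> str:
    """Classify a candidate epitope against a reference of equal length.

    Returns "identical", "conservative" (at most 2 mismatches, all of them
    conservative) or "divergent" (more than 2 conservative mismatches or
    any nonconservative substitution).
    """
    if len(reference) != len(candidate):
        raise ValueError("epitopes must have the same length")
    mismatches = [(r, c) for r, c in zip(reference, candidate) if r != c]
    if not mismatches:
        return "identical"
    if len(mismatches) <= 2 and all(is_conservative(r, c) for r, c in mismatches):
        return "conservative"
    return "divergent"


reference = "AMQMLKETI"
for variant in ("AMQMLKETI", "AMQMLKDTI", "AMQMLKGTI", "VMQMLKDTL"):
    print(variant, classify_epitope(reference, variant))
```

A different definition of conservative exchange, e.g. one based on a substitution matrix, could be substituted for `is_conservative` without changing the rest of the sketch.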

A further aspect of the present invention relates to a polynucleotide according to SEQ ID NO:1, 2 or 3. The present invention further relates to a polynucleotide fragment of the nucleotide sequence according to SEQ ID NO:1, 2 or 3 or to a polynucleotide comprising at least one continuous sequence of nucleotides capable of selectively hybridizing to the nucleotide sequence as depicted in SEQ ID NO:1, 2 or 3. Further, the present invention relates to derivatives of the polynucleotides or polynucleotide fragments of the present invention. Preferably the polynucleotide or the polynucleotide fragment comprises a continuous sequence of at least 9 nucleotides, preferably at least 15 nucleotides, more preferably at least 27 nucleotides, or longer. The polynucleotide or the polynucleotide fragment may also comprise the coding region of the single HIV genes, e.g. gag, pol, env. Examples are set forth in SEQ ID NO:2 and SEQ ID NO:3. Another aspect of the present invention relates to a polynucleotide comprising at least two polynucleotide fragments of the present invention wherein the sequences of the polynucleotide fragments can overlap or can be separated by a nucleotide sequence spacer. The sequences of the polynucleotide fragments may be identical or different. The polynucleotides or polynucleotide fragments of the present invention can be used as vaccines or therapeutic substances or for diagnostics.

The cloned clade C HIV-1 97cn54 coding sequence and derivatives thereof according to SEQ ID NO:1 can be used as the basis for the following applications:

Development of clade-C specific HIV-1 vaccines for therapeutic and preventive purposes. These clade-specific vaccines can be used worldwide in all geographic regions where clade C virus strains play a major role in the HIV epidemic, e.g. in Latin America, in Africa as well as in Asia. More specifically, HIV vaccines to be tested in and developed for Southeast Asia and China should be based on the described 97cn54 coding sequence in order to induce subtype specific humoral and cell mediated immune responses. Furthermore, such clade-C specific HIV-1 vaccines can be used as a component in a cocktail vaccine considering either all or a defined selection of the relevant worldwide HIV subtypes.

The antigens or coding sequences to be delivered to the immune system include (i) short continuous stretches of at least 3 to about 5 amino acids or longer stretches derived from one of the open reading frames depicted in table 3, (ii) stretches of preferably 9 to 11 amino acids, (iii) combinations of these stretches delivered either separately or as polypeptide strings (epitope strings) wherein the epitope strings and their amino acid sequences, respectively, either overlap or may be separated by amino acid or other spacers, and most preferably (iv) complete proteins or the corresponding coding sequences or variations thereof that may also include extended deletions in order to induce proper humoral and cell mediated immune responses in the vaccinees. Therefore, another object of the present invention relates to polypeptides which are encoded by the nucleotide sequence or fragments thereof depicted in SEQ ID NO:1, SEQ ID NO:2 and SEQ ID NO:3. Preferably the polypeptide comprises a continuous sequence of at least 8 amino acids, preferably at least 9 to 11 amino acids, more preferably at least 15 amino acids or longer sequences, or discontinuous epitopes preferably composed of at least three amino acids of a single polypeptide chain or, in the case of oligomer protein complexes, of different polypeptide chains. Vaccine constructs on the basis of the 97cn54 coding sequence include all antigenic forms known in the state of the art and include all known delivery systems.

Short epitopes encoded by fragments of the nucleic acid sequences according to SEQ ID NO:1 to 3 and comprising in each case 3 to 5 amino acids, preferably 9 to 11 or more amino acids, can preferably be produced synthetically. Such peptides consist of either a B cell epitope, an MHC class II restricted T helper epitope, an MHC class I restricted cytotoxic T cell epitope or a combination of the mentioned variants. Individual epitopes may overlap or be separated by a spacer, preferably consisting of glycine and/or serine moieties. Branched peptides may, according to the state of the art, be generated either during the synthesis or by means of the known and commercially available homo- and heterobifunctional chemical crosslinkers after the synthesis and purification of the respective peptides. Alternatively, peptides that are per se only weakly immunogenic may be conjugated to selected carrier proteins, e.g. ovalbumin, by crosslinking, inserted genetically into carrier proteins or fused to their N and C termini, respectively. Preferably, such carrier proteins are able to form particulate structures in which B cell epitopes preferably lie on the surface of the particulate carrier (i) during expression in suitable cell culture systems (see below) or (ii) after suitable refolding of the purified denatured protein. Numerous examples of polypeptides tending to form particulate structures are now known, e.g. the Hepatitis B virus (HBV) core antigen (HBcAg), the HBV surface protein (HBsAg), the HIV group specific antigen, the polyomavirus VP1 protein, the papillomavirus L1 protein or the yeast TyA protein. Because the majority of the particle forming proteins described so far are derived from the capsid or structural proteins of different viruses, they are also named virus like particles; see the special edition of Vaccine (1999), Vol. 18, Advances in Protein and Nucleic Acid Vaccine Strategies, edited by Prof. P. T. P. Kaumaya.

Epitope strings and polypeptides encoded by fragments of the nucleic acid sequences according to SEQ ID NO: 1 to 3 having a length of more than 30, preferably more than 50 amino acids, and polypeptides having a tendency to form particulate structures (VLP) can be produced and purified in prokaryotes by means known in the state of the art. The plasmids accordingly include a bacterial origin of replication such as ColE1, generally a selection marker such as a resistance against kanamycin or ampicillin, a constitutively active or inducible transcription control unit such as the LacZ or Tac promoter, and translation start and stop signals. For simplified expression and affinity purification, optionally cleavable fusion parts and purification means such as glutathione S-transferase or oligohistidine tags may be used.

The DNA or RNA sequences used (i) for the production of said epitope strings, complete proteins or virus like structures in eukaryotic cell cultures such as yeast cells, fungi, insect cells or mammalian cells or (ii) for the direct delivery of DNA for immunization purposes may rely on the codon usage that is utilized by the virus itself. Alternatively, wherever technically feasible, the codon usage may be adapted to the most or second most frequently used codons in genes that are highly expressed in the respective production system. Examples of the optimization of the codon usage in a polygene optimized for safety aspects, including the genes Gag, Pol and Nef, as well as in the envelope gene are set forth in SEQ ID NO:2 and 3. SEQ ID NO: 2 and 3 are described in more detail in example 15.
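Adapting the codon usage to the most or second most frequently used codons of a production system can be sketched in Python as below. This is an illustration only and not the procedure used to obtain SEQ ID NO:2 or 3. The function name `adapt_codons` is hypothetical, and the usage table is a toy placeholder with made-up relative frequencies covering only four amino acids; a real application would substitute a measured codon usage table for highly expressed genes of the chosen host.

```python
# PLACEHOLDER DATA: illustrative relative frequencies, not measured values.
TOY_USAGE = {
    "M": {"ATG": 1.0},
    "K": {"AAG": 0.6, "AAA": 0.4},
    "E": {"GAG": 0.6, "GAA": 0.4},
    "T": {"ACC": 0.4, "ACA": 0.3, "ACT": 0.2, "ACG": 0.1},
}


def adapt_codons(protein: str, usage: dict, rank: int = 0) -> str:
    """Back-translate a protein using the codon of the given frequency rank.

    rank=0 selects the most frequently used codon for each amino acid,
    rank=1 the second most frequent one; if an amino acid has fewer
    codons than requested, the least frequent available codon is used.
    """
    codons = []
    for aa in protein:
        ranked = sorted(usage[aa], key=usage[aa].get, reverse=True)
        codons.append(ranked[min(rank, len(ranked) - 1)])
    return "".join(codons)


print(adapt_codons("MKET", TOY_USAGE))          # most frequent codons
print(adapt_codons("MKET", TOY_USAGE, rank=1))  # second most frequent codons
```

Because every codon is exchanged only for a synonymous one, the encoded amino acid sequence is unchanged; only the nucleotide sequence is adapted to the host.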

The establishment of cell lines to produce epitope strings, polypeptides or virus like particles in the mentioned cell culture systems may be based on vectors according to the state of the art. Said vectors again may include a bacterial origin of replication, a positive or negative selection marker and primarily the respective control regions for the normal transcription and translation of the foreign protein. The components of the DNA vaccine constructs described below also exemplify those modules which are found in vectors used to express epitope strings, polypeptides or complete proteins in different mammalian cell cultures.

The simplest form of immunization is the direct application of a pure DNA vaccine. Said vaccine essentially includes (i) 5′ of the coding region a transcription control region, also called promotor/enhancer region, optionally followed by a functional intron to enhance the gene expression, and (ii) a Kozak consensus sequence including a translation start codon as well as a translation termination codon followed by a polyadenylation signal at the 3′ end of the foreign gene. Preferentially, the promotor/enhancer region supports the constitutive expression of the desired gene product and is derived e.g. from the transcription control region of a cytomegalovirus immediate early gene (CMV-IE) or the Rous sarcoma virus long terminal repeat (RSV-LTR). Alternatively, an inducible form of a transcription control region may be used such as a Tet on/Tet off promotor regulating the transcription e.g. by the application of tetracycline or respective analogues. Furthermore, the use of cell type specific regulated transcription control regions is advantageous, e.g. the promotor/enhancer regions positioned upstream of the muscle creatine kinase gene (MCK gene; muscle specific expression) or of the CD4 receptor gene or the MHC class II gene (preferential expression in antigen presenting cells). In some cases chimeric combinations of (i) cell type specific promotors and (ii) viral enhancer regions are also used to combine the advantages of a tissue specific expression with those of the strong transcription activity of viral enhancers. The enhancement of the gene expression by integration of a functional intron, normally positioned 5′ of an open reading frame, is due to an enhanced export rate from the nucleus of spliced transcripts in comparison to unspliced transcripts and is obtained by the insertion of an intron found in the β-globin gene.

A preferred DNA vaccine based on SEQ ID NO:1, 2 or 3 additionally includes a replicon derived from alpha viruses such as Semliki-Forest virus (SFV) or Venezuela-Encephalitis virus (VEE). Here, the aforementioned nuclear transcription control region and the optional intron are followed first by the coding region for the VEE or SFV non structural proteins (NS). Only 3′ thereof follows the actual foreign gene, whose cytoplasmic transcription is regulated by an NS sensitive promotor. Correspondingly, a long transcript spanning several open reading frames is generated starting from the nuclear transcription control unit and is then translocated into the cytoplasm. The NS proteins synthesized there then activate the cytoplasmic transcription of the foreign genes by binding to the respective control region. This amplification effect normally leads to an abundant RNA synthesis and hence to a high synthesis rate of the foreign protein. The latter normally allows a significant reduction of the plasmid amount to be administered, with at least comparable immunogenicity, in direct comparison with conventional plasmids which lack the described effect of cytoplasmic RNA amplification.

The afore described peptides, proteins, virus like particles and DNA constructs can be administered by intramuscular, subcutaneous, intradermal or intravenous injection, whereby the respective prior art is used for the administration of the proteinaceous antigens. Either conventional syringes with injection needles or needle-free means, which normally introduce the DNA by air pressure directly into the desired tissue, may be used for the DNA immunization. This comprises in particular also the intranasal and oral application of DNA containing vaccine formulations by spray-type means. Alternatively, the DNA can also be conjugated to solid supports such as gold beads and be administered via air pressure into the respective tissues.

To enhance or modulate the immune response the mentioned proteinaceous antigens and DNA constructs may be administered in combination or in sequential chronology with so called adjuvants, which are nominally stimulators of the immune response. Conventional adjuvants such as aluminium hydroxide or aluminium hydroxyphosphate result in a stimulation of the humoral immune response showing a high antibody titer of the IgG1 subtype. More modern adjuvants such as CpG oligonucleotides (consensus core sequence: purine-purine-CpG-pyrimidine-pyrimidine) or chemically modified derivatives thereof (phosphorothioate oligonucleotides, oligonucleotides with a peptide backbone) usually enhance the cellular arm of the immune response and support primarily the cell mediated immunity of the Th1 type, which is characterized by a high antibody titer of the IgG2a subtype and the induction of Th1 cytokines such as γ-IFN, IL-2 and IL-12.

The administration and uptake of peptides, proteins and DNA vaccine constructs can be improved in particular by binding to or incorporation into higher molecular structures such as biodegradable particles, multilamellar, preferably cationic liposomes, immune stimulating complexes (ISCOMS), virosomes or in vitro assembled virus particles. Said biodegradable particles are e.g. PLA- (L-lactic acid), PGA- (polyglycolic acid) or PLGA-[poly (D,L-lactide-co-glycolide)] microspheres or derivatives thereof, cationic microparticles or carrier substances derived from bacterial polysaccharide capsules. The collective term ISCOMS designates immune stimulating complexes which are derived from water soluble extracts from the bark of Quillaja saponaria and are purified by chromatographic methods. A detailed summary of the prior art of the various adjuvants and administration means is given in www.niaid.gov/aidsvaccine/pdf/compendium.pdf [Vogel, F. R., Powell, M. F. and Alving, C. R., A Compendium of Vaccine Adjuvants and Excipients (2nd Edition)].

Furthermore, viral and alternatively bacterial vectors may be used for a suitable presentation of epitope strings, polypeptides and virus like particles.

According to the current state of the art, e.g. genetically modified salmonellae and listeriae are preferably suited, due to their natural cell tropism, to introduce DNA vaccine constructs into antigen presenting cells like monocytes, macrophages and primarily dendritic cells. Besides the benefit of cell type specificity, the genetic modifications can contribute to the fact that the DNA can enter the cytoplasm of the antigen presenting cell without damage. In this case a DNA vaccine construct enters the cell nucleus, where the respective reading frame is transcribed via a eukaryotic, preferably viral or cell type specific promoter with use of the cellular resources and proteins. The respective gene product is translated after the transport of the RNA into the cytoplasm and is, according to the respective conditions, modified posttranslationally and assigned to the respective cellular compartment.

Bacterial vectors (salmonellae, listeriae, yersiniae etc.) may also be used for the induction of a mucosal immunity, preferably after oral administration. The respective antigens are produced by the bacterial transcription and translation machinery and are therefore not subject to the posttranslational modifications usually present in mammalian cells (no respective glycosylation; no secretory pathway).

In addition, a plurality of attenuated viral vectors now exist which are helpful in expressing the desired antigens successfully and in high yields. Besides their capability of mere antigen production, such viral vectors can be used directly for the immunization. Said immunization may take place either ex vivo, e.g. by the infection of antigen presenting cells which are subsequently administered to the vaccinated subject, or directly in vivo by subcutaneous, intradermal, intracutaneous, intramuscular or intranasal immunization with the recombinant virus, resulting in a beneficial antigen presentation with the respective immunization success. Thus, for example, adequate humoral and cell mediated immune responses may be induced in the vaccinated subjects by immunization with recombinant vaccinia viruses such as Modified Vaccinia Ankara virus (MVA) attenuated by passage through chicken cells, the genetically attenuated vaccinia type New York (NYVAC) or the avian pox viruses endemic in birds (Fowlpox, Canarypox). Alternatively, several other viruses are also suitable, e.g. recombinant alpha-viruses such as the Semliki-Forest virus or the Venezuela-Encephalitis virus, recombinant adenoviruses, recombinant Herpes simplex viruses, influenza viruses and others.

Finally, also attenuated HIV viruses may be generated based on SEQ ID NO:1, 2 or 3 and used for immunization purposes, if the regulation sequences (LTR, long terminal repeat) flanking the coding part are supplemented according to the prior art by cloning methods. A sufficient attenuation can subsequently be obtained by one or more deletions e.g. in the nef gene according to the prior art.

The nucleic acid sequences depicted in SEQ ID NO:1 and 3 as well as the derived peptides, proteins and virus like particles therefrom may also be used as components of viral vectors for the gene transfer.

The polypeptides encoded by the GagPol gene (SEQ ID NO:1, nucleotide 177-4458; table 3) can e.g. provide the packaging and receptor functions of e.g. lentiviral or retroviral vectors. Virus particles which are also able to transduce resting postmitotic or terminally differentiated cells may be generated e.g. after transient transfection of mammalian cells by suitable plasmid vectors which support the simultaneous expression of the GagPol and VSV-G (vesicular stomatitis virus envelope protein) genes and ensure the packaging of the therapeutic transgene. Said method to generate transduction competent virus particles can be significantly facilitated and made more efficient e.g. by establishing stable cell lines, e.g. based on human embryonic kidney cells (HEK293), which express the GagPol polyprotein constitutively or under the control of an inducible promotor. Alternatively, recombinant adenoviruses may be generated which encode the packaging functions, the receptor functions and the transgene function or combinations thereof and, thus, serve as a tool for the ex vivo, in situ and in vivo delivery of retroviral or lentiviral vectors.

The envelope proteins or derivatives thereof encoded by SEQ ID NO:3 can provide the receptor function for lentiviral, spumaviral or retroviral vectors or other vectors based on coated viruses by incorporation in the lipid bilayer. For this purpose e.g. packaging cell lines may be generated which either express the GagPol proteins from retroviruses, spumaviruses and preferably lentiviruses as well as the envelope proteins derived from SEQ ID NO:1 and 3 constitutively or under the control of an inducible, alternatively, a regulable promotor. Alternatively, chimeric viruses based on the genome of type C or D retroviruses or other membrane coated viruses such as influenza virus or herpes virus may be generated carrying on the surface in addition to or instead of the naturally occurring envelope protein an envelope protein derived from SEQ ID NO:1 or 3.

Against the peptides, proteins or virus like particles derived from SEQ ID NO:1 to 3, (i) polyclonal antisera, (ii) monoclonal antibodies (murine, human, camel), (iii) antibody derivatives such as single-chain antibodies, humanized antibodies, bi-specific antibodies, antibody phage libraries or (iv) other high affinity binding polypeptides such as derivatives of the hPSTI (human pancreatic secretory trypsin inhibitor) may be generated. Said reagents may be used for therapeutic purposes e.g. for the treatment of HIV infections or for diagnostic purposes e.g. for the production of test kits.

Similarly, analogous peptides, proteins or nucleic acid sequences derived from SEQ ID NO:1, 2 or 3 may be used for diagnostic purposes e.g. for serodiagnosis, applying nucleic acid hybridization technologies, employing nucleic acid amplification systems or combinations thereof. Preferably, the polynucleotide fragments of the nucleotide sequence according to SEQ ID NO:1 according to the invention may be used in a polymerase chain reaction. Particularly preferred is the use of the polynucleotide fragments of the nucleotide sequence according to SEQ ID NO:1 according to the invention for diagnostics by means of DNA chip technology.

The invention is illustrated but not limited by the following examples.

EXAMPLES

Example 1

Blood samples. All the blood samples used in this study were collected from HIV-1 subtype C seropositive injection drug users (IDUs) in the national molecular epidemiology survey during 1996-1997 at several HIV epidemic areas in China. Peripheral blood mononuclear cells (PBMCs) were separated by ficoll gradients. Viruses were isolated by cocultivating the PBMCs from seropositive IDUs with phytohemagglutinin (PHA) stimulated donor PBMCs. Positive virus culture was detected from cell culture supernatants by HIV-1 p24 Core Profile ELISA kit (DuPont Inc., Boston, Mass.).

Example 2

Polymerase chain reactions and DNA sequencing. Proviral DNA was extracted from productively infected PBMCs of more than one hundred preselected HIV-1 positive IDUs from the Northwestern provinces of China (Qiagen Inc., Valencia, Calif.). Nested-PCR was used to amplify the envelope C2V3 coding region. PCR products were directly sequenced by Taq-cycle sequencing using fluorescent dye-labeled terminators (Applied Biosystems, 373A, Foster City, Calif.) as previously described (Bai et al. 1997; Yu et al. 1997). Multiple sequence alignments were performed by applying the Wisconsin software package Genetics Computer Group with correction methods of Kimura (GCG, 1997, version 9).

Example 3

Phylogenetic tree analysis of all obtained sequences was performed using the PHYLIP software package. Evolutionary distances were calculated by the maximum parsimony method and are indicated by cumulative horizontal branch length. The statistical robustness of the neighbour joining tree was tested by bootstrap resampling as described (Graf et al. 1998).
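The resampling step of a bootstrap test can be sketched as follows. This is an illustrative sketch only and not the PHYLIP implementation: it draws alignment columns with replacement to produce one pseudo-replicate alignment. Tree building on each replicate and the counting of recovered clades are not shown, and the function name is an assumption made for this example.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Return one pseudo-replicate by sampling columns with replacement.

    alignment: dict mapping sequence name -> aligned sequence (equal lengths).
    rng: a random.Random instance, passed in so that runs are reproducible.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    length = lengths.pop()
    # The same column indices are used for every sequence, so each
    # resampled column is an intact column of the original alignment.
    columns = [rng.randrange(length) for _ in range(length)]
    return {
        name: "".join(seq[c] for c in columns)
        for name, seq in alignment.items()
    }
```

Bootstrap support for a branch is then the fraction of replicates whose tree contains that branch.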

Example 4

Selection of a representative C-clade HIV-1 isolate from Chinese IDUs. The calculated average intra-group distances within the C2V3 coding region were as low as 2.26±1.43 on DNA level, indicating that the epidemic in this area is still very young. Inter-group differences between the Chinese clade C sequences and those of Indian, African and South American origin were 9.67±2.31 (India), 15.02±4.13 (Africa) and 8.78±3.41 (South America), respectively. This demonstrates a close phylogenetic relationship between Indian and Chinese clade C sequences (Lole et al. 1999) and a substantial genetic distance to the per se relatively heterogeneous group of African clade C HIV-1 strains.
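How such intra-group distances may be obtained can be sketched as follows. The sketch assumes that the reported values are percentages and that distances are corrected with Kimura's two-parameter model, d = -1/2 ln((1 - 2P - Q) sqrt(1 - 2Q)), where P and Q are the proportions of transitions and transversions. It is not the GCG implementation, and the function names are assumptions made for this example.

```python
import math
from itertools import combinations
from statistics import mean, stdev

PURINES = {"A", "G"}


def k2p_distance(seq_a, seq_b):
    """Kimura two-parameter distance between two aligned DNA sequences."""
    pairs = [
        (x, y)
        for x, y in zip(seq_a.upper(), seq_b.upper())
        if x in "ACGT" and y in "ACGT"  # skip gaps and ambiguity codes
    ]
    if not pairs:
        raise ValueError("no comparable positions")
    # Transition: purine<->purine or pyrimidine<->pyrimidine exchange.
    transitions = sum(
        1 for x, y in pairs if x != y and (x in PURINES) == (y in PURINES)
    )
    transversions = sum(
        1 for x, y in pairs if x != y and (x in PURINES) != (y in PURINES)
    )
    p = transitions / len(pairs)
    q = transversions / len(pairs)
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


def intra_group_distance(sequences):
    """Mean and standard deviation (in percent) over all sequence pairs."""
    distances = [100 * k2p_distance(a, b) for a, b in combinations(sequences, 2)]
    return mean(distances), stdev(distances)
```

`intra_group_distance` needs at least three sequences, since the standard deviation is undefined for a single pair. Inter-group values follow analogously by pairing each sequence of one group with each sequence of the other.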

Example 5

Identification of a virus isolate representing best the prevalent clade C virus strain circulating throughout China. From the analyzed specimens, a representative isolate referred to as 97cn54 was identified exhibiting highest homology (99.6%) to a calculated consensus sequence (cn-conV3), which has been established on the basis of the characterized local HIV-sequences (Table 1). Multiple amino acid sequence alignments including primary C-clade representatives V3-loop sequences selected from different epidemic regions as well as consensus sequences of other clades (A-H, O, CPZ) underlined the subtype C character of the selected primary isolate 97cn54 (Table 1). Compared with an overall V3 consensus sequence (consensus), 97cn54 as well as cn-con-c show amino acid alterations at position 13 (H→R) and 19 (A→T), both of which are characteristic for subtype C isolates (C_consensus).

TABLE 1 V3-loop amino acid sequence alignment

position     1          11         21         31    38
Consensus    CTRPNNNTRK SIHIGPGQAF YA---TGDII GDIRQAHC  SEQ ID NO:4
C_94IN11246  ---------- --r-----t- --   --e-v -n------  SEQ ID NO:5
C_93IN905    ---------- --r-----t- --   ----m --------  SEQ ID NO:6
C_93IN999    -vr------e --r-----t- --   --e-- --------  SEQ ID NO:7
C_consensus  ---------- --r-----t- --...----- --------  SEQ ID NO:8
C_ind8       ---------- -tr-----t- --...----- --------  SEQ ID NO:9
97cn54-v3    ----g----- --r-----t- --...----- --------  SEQ ID NO:10
cn-con-v3    ----g----- --r-----t- --...----- --------  SEQ ID NO:10
C_bro025     ---------- --r------- --...--e-- --------  SEQ ID NO:11
C_ind1024    ---------- --r-----t- --...----- ----r-y-  SEQ ID NO:12
C_nof        ---------- r-rv----tv --...-na-- --------  SEQ ID NO:13
C_zam20      -a--g----- --r-----t- f-...--a-- --------  SEQ ID NO:14
C_sm145      ---ya----- -vr-----t- -....-n--- --------  SEQ ID NO:15
A_consensus  ---------- -vr------- --...----- --------  SEQ ID NO:16
B_consensus  ---------- -------r-- -t...--e-- --------  SEQ ID NO:17
D_consensus  ----y----q rt-------l -....-tr-- --------  SEQ ID NO:18
E_consensus  ----s----t --t-----v- -r...----- ----k-y-  SEQ ID NO:19
F_consensus  ---------- ---l------ --...----- ----k---  SEQ ID NO:20
G_consensus  ---------- --t------- --...----- --------  SEQ ID NO:21
H_consensus  ---------- --s------- --...----- ----k-y-  SEQ ID NO:22
O_consensus  -e--gidiqe .-r---.m-w -smglg-tng nss-a-y-  SEQ ID NO:23

V3 amino acid alignment of consensus sequences from different HIV-1 clades (A-O) and selected subtype C isolates from different countries. The overall V3 consensus sequence was constructed by aligning consensus sequences from different clades (A-O). cn-con-V3 represents the consensus sequence of HIV-1 subtype C strains prevalent in China. 97cn54 has been selected as the standard representative isolate of the most prevalent clade C HIV-1 strains circulating throughout China. “-” indicates no exchange relative to the V3 consensus sequence, lower case letters indicate an amino acid substitution and “.” indicates a gap. All consensus and isolate sequences for multiple alignments were obtained from the Los Alamos database.
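The notation used in Table 1 can be reproduced with a short helper. The sketch below is illustrative only: it assumes that the sequence has already been aligned to the consensus, and the function name is an assumption made for this example.

```python
def render_against_consensus(consensus, sequence, gap_chars="-."):
    """Render an aligned sequence in the notation of Table 1.

    "-" marks identity with the consensus, a lower case letter marks a
    substitution and "." marks a gap in the rendered sequence.
    """
    if len(consensus) != len(sequence):
        raise ValueError("sequences must be aligned to the same length")
    rendered = []
    for ref, res in zip(consensus, sequence):
        if res in gap_chars:
            rendered.append(".")
        elif res.upper() == ref.upper():
            rendered.append("-")
        else:
            rendered.append(res.lower())
    return "".join(rendered)
```

Applied to positions 11 to 20 of the consensus (SIHIGPGQAF) and a sequence carrying R at position 13 and T at position 19, the helper returns `--r-----t-`, the pattern shown for 97cn54-v3.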

Example 6

The 97cn54 envelope protein coding sequence is most closely related to Indian clade C virus strains. Phylogenetic tree analysis, initially based on the C2V3 sequences of the envelope gene, revealed that both 97cn54 and the consensus sequence of Chinese clade C isolates cluster with the subtype C strains from India (ind8, ind1024, c-93in905, c-93in999, c-93in11246), Africa (c-eth2220, c-ug286a2) and South America (92br025, nof, cam20 and sm145). This suggests that the Indian C-clade virus strains might be the source of the HIV-1 subtype C epidemic in China (FIG. 1). This hypothesis is also in agreement with our earlier epidemiological report that the HIV-1 subtype C infected individuals in Yunnan shared needles with Indian jewellery businessmen in the boundary area (Shao et al. 1999).

Example 7

Cloning of the virtually full length HIV-1 genome. Virtually full-length HIV-1 genomes were amplified using the Expand Long Template PCR system (Boehringer-Mannheim, Mannheim, Germany) as described (Graf et al. 1998; Salminen et al. 1995). Primers were positioned in conserved regions within the HIV-1 long-terminal repeats (LTR): TBS-A1 (5′-ATC TCT AGC AGT GGC GGC CGA A SEQ ID NO:24) and NP-6 (5′-GCA CTC AAG GCA AGC TTT ATT G SEQ ID NO:25). Purified PCR-fragments were blunt-end ligated into a SrfI digested pCR-Script vector (Stratagene, Heidelberg, Germany) and transformed into E. coli strain DH5α. Several recombinant clones containing virtually full-length HIV-1 genome were identified by restriction fragment length polymorphism (RFLP) analysis and sequencing of the V3-loop coding sequence. According to RFLP analysis, using different combinations of restriction endonucleases, followed by sequencing of the V3-loop coding sequence, 77% of the positive full-length constructs were close to identical. A provirus construct representing the vast majority of the positive clones was selected and sequenced as described above using the primer-walking approach (primers were designed approximately every 300 bp along the genome for both strands).
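Basic properties of the two primers quoted above can be checked with a few lines of code. The sketch below is illustrative only: it removes the triplet spacing, counts the length and G+C fraction, and provides a reverse complement helper. The function names are assumptions made for this example, and no melting temperature is estimated.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Primer sequences as quoted in the text (5' to 3').
PRIMERS = {
    "TBS-A1": "ATC TCT AGC AGT GGC GGC CGA A",
    "NP-6": "GCA CTC AAG GCA AGC TTT ATT G",
}


def clean(primer):
    """Remove the triplet spacing and normalise to upper case."""
    return primer.replace(" ", "").upper()


def gc_fraction(sequence):
    """Fraction of G and C residues in the sequence."""
    return (sequence.count("G") + sequence.count("C")) / len(sequence)


def reverse_complement(sequence):
    """Reverse complement of an unambiguous DNA sequence."""
    return sequence.translate(COMPLEMENT)[::-1]


for name, raw in PRIMERS.items():
    seq = clean(raw)
    print(name, len(seq), round(gc_fraction(seq), 2), reverse_complement(seq))
```

Both primers are 22 nucleotides long; the reverse complement is the strand to search for when locating a primer binding site on the opposite strand.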

Example 8

DNA sequences were assembled using Lasergene Software (DNASTAR, Inc, Madison, Wis.) on Macintosh computers. All the reference subtype sequences in this study are from the Los Alamos HIV database. Nucleotide sequence similarities were calculated by the local homology algorithm of Smith and Waterman. Multiple alignments of sequences with available sequence data of other subtypes were performed using the Wisconsin software package Genetics Computer Group (GCG, 1997, version 9).

Example 9

Overall structure of the 97cn54 coding sequence. The 9078 bp genomic sequence derived from isolate 97cn54 contained all known structural and regulatory genes of an HIV-1 genome. No major deletions, insertions or rearrangements were found. Nucleotide sequence similarities were examined by comparing all coding sequences (CDS) of 97cn54 to consensus sequences of different genotypes and selected subtype isolates (Table 2). The highest homologies of gag, pol, env and vif reading frames to the corresponding clade-C consensus sequences were within a range of 93.93%-95.06%. This observation considerably extended the above C2V3 based sequence comparison and phylogenetic tree analysis (see Table 1 and FIG. 1) and therefore clearly confirmed that the selected virus isolate belongs to the group of previously published C-clade virus strains. However, the homology values determined by this kind of analysis for the tat, vpu, vpr and nef genes were not sufficient to allow a clear assignment of these reading frames to clade-B or C virus strains (Table 2). For the vpu gene, the highest homologies were found to clade-B (94.24%) compared with only 78.23% to a clade-C consensus sequence. Similar observations were made for the tat gene with highest homology to the B′-rl42 isolate (>91%) as compared to 87.9% (C-92br025) and 85.5% (C-eth2220) for selected primary C-clade representatives or 89.01% for the clade-C consensus sequence. These data, together with the occurrence of B, C and E genotypes throughout the epidemic area of Yunnan, suggested that the analyzed virus isolate might represent a mosaic virus strain that resulted from a B′/C interclade recombination event.

TABLE 2 Comparison of 97cn54 derived coding sequences with the corresponding genes of reference strains and clade specific consensus sequences

percentage identity to 97cn54

CDS        gag     pol    vif    vpr     tat     rev    vpu    env    nef
A          87.68   91.80  86.81  83.66   84.90   83.97  79.82  85.75  84.19
B          90.43   91.93  88.04  90.31   86.56   82.08  94.24  84.52  88.13
B-mn       89.38   90.82  86.01  89.31   87.44   79.48  88.21  82.33  85.41
B′-rl42    91.53   90.76  86.01  88.97   91.163  80.23  96.74  82.70  85.99
C          94.65   94.29  95.06  91.39   89.01   91.99  78.23  93.93  88.82
C-92br025  92.19   92.91  88.51  90.03   87.91   89.70  76.13  88.51  86.20
C-eth2220  91.4    92.06  87.15  90.77   85.57   88.08  80.09  87.15  87.08
D          89.80   91.08  87.74  87.94   83.93   84.39  87.30  85.26  86.88
E/A        86.324  89.07  86.59  83.39   81.44   81.74  77.31  82.09  84.18
F          88.02   88.99  86.36  86.25   80.65   86.25  82.33  84.02  /
G          88.08   /      /      /       /       /      /      84.55  /
H          87.69   89.45  86.01  85.22   /       /      /      83.74  /
O          73.42   78.02  72.12  76.604  72.31   76.60  59.54  67.01  80.35
CPZ        74.14   78.80  93.75  75.44   76.00   75.44  64.41  72.42  /

Nucleotide sequence comparison of all coding sequences (CDS) between 97cn54 and DNA sequences, representing either: (1) consensus sequences of distinct HIV-1 clades (obtained from the Los Alamos HIV database) or (2) standard subtype C (92br025 and eth2220) and B (mn and rl42) isolates. The data indicate the percentage identity of a given sequence to 97cn54. Ambiguous nucleotide positions within consensus sequences were scored as a match. The highest degrees of homology are highlighted in boldface. /, no consensus sequence was available from the Los Alamos database.

Example 10

Determination of intersubtype recombinations. Recombinant Identification Program (RIP, version 1.3; http://hiv-web.lanl.gov/tools) was used to identify potential mosaic structures within the full-length sequence of this clone (Window size: 200; Threshold for statistical significance: 90%; Gap handling: STRIP; Informative mode: OFF). Gaps were introduced in order to create the alignment. The background subtypes sequences in this analysis were: u455 (subtype A), RL42 (Chinese subtype B-Thai (B′)), eth2220 (subtype C), z2d2 (subtype D), 93th2 (subtype A/E).
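The principle of such a window-based comparison can be sketched as follows. This is not the RIP program: the sketch only slides a window along a pre-aligned query, scores its identity to each background sequence and reports the best match per window. Positions with a gap are skipped pairwise rather than stripped from the whole alignment, no significance threshold is applied, and all names are assumptions made for this example.

```python
def window_identity(query, reference, start, size):
    """Fraction of identical positions in one window, ignoring gaps."""
    pairs = [
        (q, r)
        for q, r in zip(query[start:start + size], reference[start:start + size])
        if q != "-" and r != "-"
    ]
    if not pairs:
        return 0.0
    return sum(q == r for q, r in pairs) / len(pairs)


def best_match_per_window(query, references, size=200, step=1):
    """Return (window start, best background name, identity) per window.

    query: aligned query sequence.
    references: dict mapping background name -> aligned sequence.
    """
    results = []
    for start in range(0, len(query) - size + 1, step):
        scores = {
            name: window_identity(query, ref, start, size)
            for name, ref in references.items()
        }
        best = max(scores, key=scores.get)
        results.append((start, best, scores[best]))
    return results
```

A change of the best matching background subtype between neighbouring windows marks a candidate breakpoint, which would then be confirmed by phylogenetic analysis of the flanking regions.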

Example 11

Interclade recombination in the Gag-Pol coding region of 97cn54. Although substantial homologies to C-clade virus strains were observed within the highly conserved gag and pol reading frames, RIP analysis identified 3 areas of interclade recombination within gagpol around positions 478-620, 1290-1830 and 2221-2520 upstream of the gag start codon. These dispersed stretches are located within gag and pol reading frames showing highest homology towards prototype B (data not shown) and in particular highest towards a subtype-B(B′) isolate originating from Yunnan (FIG. 2). This observation clearly underlines the importance of RIP analysis, since simple homology alignments based on complete genes were not able to identify these small interspersed fragments of a different subtype. In order to confirm the data obtained by RIP analysis we created several phylogenetic trees using regions either flanking or spanning the stretches of proposed recombination (FIG. 3). Using various standard representatives of different subtypes and some selected C-clade primary isolates, all proposed areas of recombination could be confirmed by differential clustering of 97cn54 with the respective C (FIG. 3 A, C, E, G) or B-clade reference isolates (FIG. 3 B, D, F).

Example 12

Interclade recombination in the Env coding region of 97cn54. As expected from the sequence alignments summarized in table 2, the RIP analysis clearly confirmed the intersubtype recombination between subtype (B′)-Thai and C (FIG. 4). A fragment of about 1000 bp extending from 3′ 150 bp of vpr through the first exon of tat and rev to vpu showed the highest degree of homology with the local subtype (B′) representative (rl42) (FIG. 4 A). Furthermore, an about 300 bp sequence stretch overlapping the 5′-half of the nef gene showed highest homology to the (B′)-Thai subtype whereas the remaining part including a 300 bp fragment extending to the 3′-LTR clustered with subtype C (FIG. 4 B).

Extending the RIP analysis, phylogenetic trees showed closest relationship of vpr/vpu and the 5′-portion of the nef gene to clade-B isolates (FIG. 5 A, B), whereas the 3′-nef fragment clearly clustered with subtype C representatives (FIG. 5 C). Further analysis confirmed that the subtype B sequence within this mosaic is more closely related to a very recently described Thai-(B′) strain (rl42) isolated from a Chinese IDU (Graf et al. 1998) than to prototype B isolates (mn and sf2) (table 2).

Example 13

Representative character of 97cn54. Breakpoints located in the vpr/vpu coding region as well as in the nef gene of 97cn54 were found at almost identical positions in all subtype C strains isolated from IDUs living in the Northwestern provinces of China. Two RIP analyses representative of 8 independently isolated and analyzed HIV-1 strains from different HIV-1 infected individuals in the Xinjiang autonomous region are shown in FIG. 4 C and D. Regarding the origins of 97cn54 (southwest of China) and xj24 and xj15 (northwest area), these data suggest a common ancestor for the C/B′ recombinant strains circulating throughout China. In conclusion, our results demonstrate that 97cn54 represents a C/(B′) interclade mosaic virus with 10 breakpoints of interclade recombination that is most prevalent among the IDUs within the Northwestern provinces of China. A schematic representation of the (B′/C) mosaic genome of isolate 97cn54 is given in FIG. 6.

Example 14

Prediction of cross-clade specific epitopes for HIV specific cytolytic T cells. Genomic sequences offer the opportunity to assess the conservation of known CTL epitopes, which may have an impact on the efficacy of HIV-1 candidate vaccines. Most reagents and data on CTL epitopes are derived from clade B HIV-1 LAI sequences. In order to provide an estimate of cross clade CTL-epitope conservation, the predicted protein sequences of 97cn54 were compared to the known and best mapped LAI specific CTL epitopes. Of 194 reported HIV-1 CTL epitopes, 75, 55, 40 and 24 are located in Gag (p17, p24, p15), in the reverse transcriptase (RT), in gp120 and gp41, respectively. Whereas almost 50% or more of the epitopes in Gag and RT are completely identical, only 5% and 17% of the gp120 and gp41 HIV-1 LAI derived CTL epitopes exactly matched the predicted amino acid sequences of 97cn54. However, allowing as many as 2 conservative mismatches in a given CTL epitope, an additional portion of 48% (p17), 33% (p24), 40% (RT), 57% (gp120) and 33% (gp41) of the known HIV-1 LAI CTL epitopes was related to the sequences in the corresponding 97cn54 derived polypeptides. Of course, the latter consideration has to be taken with some caution, as even conservative changes might abrogate HLA-binding or T-cell receptor recognition of an antigenic peptide. However, taken together, these observations clearly predict a considerable cross-clade CTL reactivity, especially regarding the functionally and immunologically conserved HIV-1 proteins. In addition, these data suggest that a considerable portion of the reagents (peptides, vaccinia virus constructs) that have been synthesized and established for the mapping and characterization of clade B CTL epitopes may also be useful in determining CTL reactivities on the basis of clade C HIV sequences.
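The kind of epitope comparison described above can be sketched as follows. The sketch is illustrative only: it counts every mismatch and does not distinguish conservative from nonconservative exchanges, which would require an amino acid similarity classification. The function names and the example sequences used with them are hypothetical.

```python
def best_window_mismatches(epitope, protein):
    """Smallest number of mismatches of the epitope against any window."""
    n = len(epitope)
    if len(protein) < n:
        return n
    return min(
        sum(a != b for a, b in zip(epitope, protein[i:i + n]))
        for i in range(len(protein) - n + 1)
    )


def classify_epitopes(epitopes, protein, max_mismatches=2):
    """Count epitopes that are identical, related or divergent.

    "related" means 1..max_mismatches mismatches in the best window.
    """
    counts = {"identical": 0, "related": 0, "divergent": 0}
    for epitope in epitopes:
        mismatches = best_window_mismatches(epitope, protein)
        if mismatches == 0:
            counts["identical"] += 1
        elif mismatches <= max_mismatches:
            counts["related"] += 1
        else:
            counts["divergent"] += 1
    return counts
```

Dividing each count by the number of epitopes mapped to a given protein yields percentages of the kind reported for Gag, RT, gp120 and gp41.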

TABLE 3 reading frames of 97cn54 coding sequence

reading frames  start  end   start  end
gag             177    1654
pol             1447   4458
env             5589   8168
vif             4403   4984
vpr             4924   5214
vpu             5426   5671
tat             5195   5409  7730   7821
rev             5334   5409  7730   7821
nef             8170   8790

Numbering refers to the 5′ end of the DNA sequence depicted in SEQ ID NO: 1.

Example 15

(A) Description of the synthetic C54 gp160 coding region: C-gp160. The C-gp160 gene was cloned into the unique KpnI/SacI restriction sites of the pCR-Script amp(+) cloning vector (Stratagene, Genbank Accession: U46017). The synthetic C54 gp160 coding region, which is codon-optimized to highly expressed mammalian genes, is set forth in SEQ ID NO:3. The synthetic signal sequence encodes a transport signal for the import of the encoded polypeptide into the endoplasmic reticulum.

Positions of the different coding regions are as follows:

CDS               start  end
synthetic leader  28     87
gp160             88     2580

(B) Description of the synthetic C54 gagpolnef sequence: C-gpnef. The C-gpnef gene was cloned into the unique KpnI/SacI restriction sites of the pCR-Script amp(+) cloning vector (Stratagene). The synthetic C54 gagpolnef sequence, which is codon-optimized to highly expressed mammalian genes, is set forth in SEQ ID NO:2. In the present construct the N terminal glycine is replaced by alanine (nucleotide sequence GGC) to prevent targeting of the polypeptide to the cytoplasmic membrane and the subsequent secretion of assembled virus like particles via budding. Simultaneously, a (−1) frame shift was introduced at the natural frame shift sequence to guarantee an obligatory read through of the ribosomes out of the Gag into the Pol reading frame and, thus, guarantee the synthesis of a GagPolNef polyprotein.

Positions of the different coding regions are as follows:

CDS             start  end
gag             13     1500
5′pol (ΔRT)     1501   2460
scrambled nef   2461   3090
3′pol (ΔIN)     3091   4155
RT active site  4156   4266
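The coordinates listed for C-gpnef can be checked for consistency with a short script. The sketch below is illustrative only: it treats the positions as 1-based and inclusive, which is an assumption about the numbering convention, and verifies that each segment spans whole codons and directly follows its predecessor. The function names are assumptions made for this example.

```python
# Segment coordinates as listed for C-gpnef (assumed 1-based, inclusive).
SEGMENTS = [
    ("gag", 13, 1500),
    ("5'pol (ΔRT)", 1501, 2460),
    ("scrambled nef", 2461, 3090),
    ("3'pol (ΔIN)", 3091, 4155),
    ("RT active site", 4156, 4266),
]


def segment_length(start, end):
    """Length of a segment given 1-based inclusive coordinates."""
    return end - start + 1


def extract(sequence, start, end):
    """Slice a segment out of a sequence using 1-based inclusive positions."""
    return sequence[start - 1:end]


def check_layout(segments):
    """Return a list of problems; an empty list means the layout is consistent."""
    problems = []
    for i, (name, start, end) in enumerate(segments):
        if segment_length(start, end) % 3:
            problems.append(f"{name}: length is not a multiple of three")
        if i and start != segments[i - 1][2] + 1:
            problems.append(f"{name}: does not directly follow {segments[i - 1][0]}")
    return problems


print(check_layout(SEGMENTS))
```

For the listed coordinates the check reports no problems: all five segments are contiguous and each spans a whole number of codons, as expected for a single fused reading frame.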

Example 16

The GagPolNef polygene encoded by SEQ ID NO: 1 was inserted via a KpnI/XhoI site into the vector pcDNA3.1 and transformed into E. coli strain XL1-Blue. The capability of the GagPolNef expression vector to induce a Gag specific antibody response was analyzed in female BALB/c mice (FIG. 9). Two groups of 5 animals each received a first intramuscular (i.m.) immunization with 100 μg DNA per immunization, followed by two further i.m. immunizations after 3 and 6 weeks (group 1: pcDNA-GagPolNef; group 2: pcDNA). A control group (group 3) was immunized with PBS only. The total titer of Gag specific IgG was determined against purified Gag protein by ELISA. The immunization with pcDNA-GagPolNef resulted in a rapid induction of a high titer of Gag specific antibodies (1:4,000) characterized by a typical Th1 profile of antibody isotypes (IgG2a>>IgG1). Both control groups 2 and 3 yielded no evidence for a generation of Gag specific antibodies. The antibody titer increased nearly hundredfold (1:20,000) 1 week after the first boost and resulted in a Gag specific end titer of 1:80,000 1 week after the second boost. At no time could a significant Gag specific antibody response be verified in the two control groups.

Example 17

Antigen-specific cytokine secretion was analyzed in spleen cells dissected 5 days after the second further immunization, as evidence for the induction of a T helper memory response. The spleen cells of mice that had received three i.m. immunizations with pcDNA-GagPolNef responded to the Gag-specific antigen stimulus with a significant IFN-γ secretion (table 3). A comparatively reduced IFN-γ production was observed in spleen cells dissected from mice after three subcutaneous (s.c.) or intradermal (i.d.) immunizations with pcDNA-GagPolNef following the same schedule as above. In all immunization groups, and independently of the immunization route, no significant IL-4 or IL-5 secretion was detected from the specifically restimulated spleen cells in vitro. No cytokine secretion was observed from non-stimulated spleen cells.

Accordingly, the i.m. immunization with pcDNA-GagPolNef resulted in a strong Th1 cytokine profile, whereas the s.c. administration induced a weaker Th1 response.

TABLE 4 Cytokine profile of mouse spleen cells stimulated in vitro with Gag after i.m. immunization (injection with a needle) or after i.d. or s.c. immunization by means of a particle gun with the DNA construct indicated.

DNA vaccine               IL-4 (pg/ml)   IL-5 (pg/ml)   IFN-γ (pg/ml)
pcDNA-GagPolNef (i.m.)    <8             <16            3220 ± 840
pcDNA-GagPolNef (i.d.)    <8             <16            80 ± 32
pcDNA-GagPolNef (s.c.)    <8             <16            <32

Mean values ± standard deviation of spleen cells dissected from 5 mice per experiment.

Example 18

To verify the capability of pcDNA-GagPolNef to induce Gag-specific CTLs, spleen cells were specifically restimulated in vitro 3 weeks after a first immunization with pcDNA-GagPolNef (group 1), pcDNA (group 2) or PBS (group 3) in a mixed lymphocyte tumor cell culture for 6 days and subsequently tested for cytotoxic activity. The nonameric peptide AMQMLKETI (single-letter code), derived from the Gag protein of the subtype B virus (IIIB isolate), is known to be a D^(d)-restricted CTL epitope in BALB/c mice. This peptide was used in the experiment both to restimulate the specific cytotoxic activity in vitro and to measure that activity. Gag-specific cytotoxic T cells were detected after a single i.m. injection of the pcDNA-GagPolNef plasmid but not in control groups 2 and 3. Treatment of spleen cells with the plasmid did not result in an in vitro priming of Gag-specific cytotoxic T cells. These results confirm that pcDNA-GagPolNef (i) is capable of inducing specific cytotoxic T cells, which (ii) are active across subtypes (FIG. 9).

REFERENCES

Bai, X., Su, L., Zhang, Y., and et al (1997). Subtype and sequence analysis of the C2V3 region of gp120 gene among HIV-1 strains in Xinjiang. Chin. J. Virology 13.

Carr, J. K., Salminen, M. O., Koch, C., Gotte, D., Artenstein, A. W., Hegerich, P. A., St Louis, D., Burke, D. S., and McCutchan, F. E.(1996). Full-length sequence and mosaic structure of a human immunodeficiency virus type 1 isolate from Thailand. J. Virol. 70, 5935-5943.

Carr, J. K., Salminen, M. O., Albert, J., Sanders Buell, E., Gotte, D., Birx, D. L., and McCutchan, F. E. (1998). Full genome sequences of human immunodeficiency virus type 1 subtypes G and A/G intersubtype recombinants. Virology 247, 22-31

Esparza, J., Osmanov, S., and Heyward, W. L. (1995). HIV preventive vaccines. Progress to date. Drugs 50, 792-804.

Expert group of joint United Nations programme on HIV/AIDS (1999). Implications of HIV variability for transmission: scientific and policy issues. AIDS 11, UNAIDS 1-UNAIDS 15.

Gao, F., Robertson, D. L., Morrison, S. G., Hui, H., Craig, S., Decker, J., Fultz, P. N., Girard, M., Shaw, G. M., Hahn, B. H., and Sharp, P. M. (1996). The heterosexual human immunodeficiency virus type 1 epidemic in Thailand is caused by an intersubtype (A/E) recombinant of African origin. J. Virol. 70, 7013-7029.

Gao, F., Robertson, D. L., Carruthers, C. D., Morrison, S. G., Jian, B., Chen, Y., Barre Sinoussi, F., Girard, M., Srinivasan, A., Abimiku, A. G., Shaw, G. M., Sharp, P. M., and Hahn, B. H. (1998). A comprehensive panel of near-full-length clones and reference sequences for non-subtype B isolates of human immunodeficiency virus type 1. J. Virol. 72, 5680-5698.

Gaywee, J., Artenstein, A. W., VanCott, T. C., Trichavaroj, R., Sukchamnong, A., Amlee, P., de Souza, M., McCutchan, F. E., Carr, J. K., Markowitz, L. E., Michael, R., and Nittayaphan, S. (1996). Correlation of genetic and serologic approaches to HIV-1 subtyping in Thailand. J. Acquir. Immune. Defic. Syndr. Hum. Retrovirol. 13, 392-396.

Graf, M., Shao, Y., Zhao, Q., Seidl, T., Kostler, J., Wolf, H., and Wagner, R. (1998). Cloning and characterization of a virtually full-length HIV type 1 genome from a subtype B′-Thai strain representing the most prevalent B-clade isolate in China. AIDS Res. Hum. Retroviruses 14, 285-288.

Graham, B. S. and Wright, P. F. (1995). Candidate AIDS vaccines. N. Engl. J. Med. 333, 1331-1339.

Kostrikis, L. G., Bagdades, E., Cao, Y., Zhang, L., Dimitriou, D., and Ho, D. D. (1995). Genetic analysis of human immunodeficiency virus type 1 strains from patients in Cyprus: identification of a new subtype designated subtype I. J. Virol. 69, 6122-6130.

Leitner, T. and Albert, J. (1995). Human Retroviruses and AIDS 1995: a compilation and analysis of nucleic acid and amino acid sequences. (Myers, G., Korber, B., Wain-Hobson, S., Jeang, K., Mellors, J., McCutchan, F., Henderson, L., and Pavlakis, G. Eds.) Los Alamos National Laboratory, Los Alamos, N. Mex. III147-III150.

Lole, K. S., Bollinger, R. C., Paranjape, R. S., Gadkari, D., Kulkami, S. S., Novak, N. G., Ingersoll, R., Sheppard, H. W., and Ray, S. C. (1999). Full-length human immunodeficiency virus type 1 genomes from subtype C-infected seroconverters in India, with evidence of intersubtype recombination. J. Virol. 73, 152-160.

Loussert Ajaka, I., Chaix, M. L., Korber, B., Letoumeur, F., Gomas, E., Allen, E., Ly, T. D., Brun Vezinet, F., Simon, F., and Saragosti, S. (1995). Variability of human immunodeficiency virus type 1 group O strains isolated from Cameroonian patients living in France. J. Virol. 69, 5640-5649.

Luo, C. C., Tian, C., Hu, D. J., Kai, M., Dondero, T., and Zheng, X. (1995). HIV-1 subtype C in China [letter]. Lancet 345, 1051-1052.

Myers, G., Korber, B., Foley, B., Jeang, K. T., Mellors, J. W., and Wain Hobson, S. (1996). Human retroviruses and AIDS: a compilation and analysis of nucleic acid and amino acid sequences. (Anonymous Theoretical Biology and Biophysics Group, Los Alamos, N. Mex.

Salminen, M. O., Koch, C., Sanders Buell, E., Ehrenberg, P. K., Michael, N. L., Carr, J. K., Burke, D. S., and McCutchan, F. E. (1995). Recovery of virtually full-length HIV-1 provirus of diverse subtypes from primary virus cultures using the polymerase chain reaction. Virology 213, 80-86.

Shao, Y., Zhao, Q., Wang B., and et al (1994). Sequence analysis of HIV env gene among HIV infected IDUs in Yunnan epidemic area of China. Chin. J. Virology 10, 291-299.

Shao, Y., Su, L., Sun, X., and et al (1998). Molecular Epidemiology of HIV infection in China. 12th world AIDS conference, Geneva 13132, (Abstract)

Shao, Y., Guan, Y., Zhao, Q., and et al (1999). Genetic variation and molecular epidemiology of the Ruily HIV-1 strains of Yunnan in 1995. Chin. J. Virol. 12, 9.

Sharp, P. M., Robertson, D. L., and Hahn, B. H. (1995). Cross-species transmission and recombination of ‘AIDS’ viruses. Philos. Trans. R. Soc. Lond. B. Biol. Sci. 349, 41-47.

Sharp, P. M., Bailes, E., Robertson, D. L., Gao, F., and Hahn, B. H. (1999). Origins and evolution of AIDS viruses. Biol. Bull. 196, 338-342.

World Health Organisation Network for HIV Isolation and Characterization (1994). HIV-1 variation in WHO-sponsored vaccine-evaluation sites:genetic screening, sequence analysis and preliminary biological characterization of selected viral strains. AIDS Res. Hum. Retroviruses 10, 1327-1344.

Yu, H., Su, L., and Shao, Y. (1997). Identification of the HIV-1 subtypes by HMA and sequencing. Chin. J. Epidemiol. 18, 201-204. 

1. A polynucleotide comprising the sequence of SEQ ID NO:3.
 2. A bacterial or viral vector comprising the polynucleotide of claim 1.
 3. A composition comprising the polynucleotide of claim 1 and a pharmaceutically acceptable carrier.
 4. A method of inducing an immune response in a subject comprising administering to said subject an expression construct comprising (a) an HIV env polynucleotide sequence consisting of SEQ ID NO:3 or a fragment thereof consisting of 27 contiguous bases from SEQ ID NO:3 and (b) a promoter, wherein said promoter is active in cells of said subject.
 5. A eukaryotic packaging cell line transformed with a polynucleotide comprising the sequence of SEQ ID NO:3. 